A Measure of Similarity between Graph Vertices, with Applications to Synonym Extraction and Web Searching

Authors

  • VINCENT D. BLONDEL
  • PAUL VAN DOOREN
Abstract

Abstract. We introduce a concept of similarity between vertices of directed graphs. Let GA and GB be two directed graphs with nA and nB vertices, respectively. We define an nA × nB similarity matrix S whose real entry sij expresses how similar vertex i (in GA) is to vertex j (in GB); we call sij their similarity score. In the special case where GA = GB = G, the score sij is the similarity score between vertices i and j of G, and the square similarity matrix S is the self-similarity matrix of the graph G. We point out that Kleinberg’s “hub and authority” method for identifying webpages relevant to a given query can be viewed as a special case of our definition, in which one of the graphs has two vertices and a unique directed edge between them. In analogy to Kleinberg, we show that our similarity scores are given by the components of a dominant vector of a non-negative matrix, and we propose a simple iterative method to compute them. The similarity concept has many potential applications; we illustrate one, the automatic extraction of synonyms from a monolingual dictionary.
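
The abstract does not spell out the iterative method, so the following is only a minimal sketch under stated assumptions. It assumes the update S ← A S Bᵀ + Aᵀ S B followed by Frobenius normalization, where A and B are the adjacency matrices of GA and GB, and it returns an even-numbered iterate because successive iterates can alternate between two values. The function name `similarity_matrix` and the `iterations` parameter are hypothetical names chosen for illustration.

```python
import numpy as np


def similarity_matrix(A, B, iterations=100):
    """Sketch of an iterative similarity computation (assumed update rule).

    A: (nA x nA) adjacency matrix of GA, A[i, j] = 1 for an edge i -> j.
    B: (nB x nB) adjacency matrix of GB.
    Returns an (nA x nB) matrix with non-negative entries and unit
    Frobenius norm (or all zeros if neither graph has any edge).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    S = np.ones((A.shape[0], B.shape[0]))  # start from the all-ones matrix
    # Run an even number of steps: odd and even iterates may differ.
    for _ in range(2 * iterations):
        S = A @ S @ B.T + A.T @ S @ B
        norm = np.linalg.norm(S)  # Frobenius norm for a 2-D array
        if norm == 0.0:
            break  # no edges to propagate similarity along
        S = S / norm
    return S


# Usage: GA is the two-vertex graph "hub -> authority", so row 0 of S plays
# the role of hub scores and row 1 the role of authority scores for GB.
hub_authority = np.array([[0, 1],
                          [0, 0]])
G = np.array([[0, 1, 1],   # vertex 0 points to vertices 1 and 2
              [0, 0, 0],
              [0, 0, 0]])
S = similarity_matrix(hub_authority, G)
print(S)
```

With the two-vertex graph as GA, the update reduces to the familiar mutual reinforcement between hub and authority scores, which is the special case the abstract attributes to Kleinberg.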

Similar resources

A Measure of Similarity between Graph Vertices: Applications to Synonym Extraction and Web Searching

We introduce a concept of similarity between vertices of directed graphs. Let GA and GB be two directed graphs with, respectively, nA and nB vertices. We define an nB × nA similarity matrix S whose real entry sij expresses how similar vertex j (in GA) is to vertex i (in GB): we say that sij is their similarity score. The similarity matrix can be obtained as the limit of the normalized even itera...

COSPECTRALITY MEASURES OF GRAPHS WITH AT MOST SIX VERTICES

Cospectrality of two graphs measures, in various ways, the differences between the ordered spectra of these graphs. The origin of this concept goes back to Richard Brualdi's problems proposed in \cite{braldi}: Let $G_n$ and $G'_n$ be two nonisomorphic simple graphs on $n$ vertices with spectra $$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \;\;\; \text{and} \;\;\; \lambda'_1 \geq \lambda'_2...

Automatic extraction of synonyms in a dictionary

We propose a method for automatic synonym extraction in a dictionary. Our method is based on an algorithm that computes similarity measures between vertices in graphs. This algorithm can be thought of as a generalization of Kleinberg’s web search algorithm to structure graphs that are more general than the hub-authority graph used by Kleinberg. We use the 1913 Webster’s Dictionary and apply our...

A Geometric View of Similarity Measures in Data Mining

The main objective of data mining is to acquire information from a set of data for prospective applications using a measure. A recurring difficulty is that one often has to deal with large-scale data. Several dimensionality reduction techniques, such as various feature extraction methods, have been developed to address this issue. However, the geometric view of the applied measure, as an additional consid...

Automatic Discovery of Similar Words

We deal with the issue of automatic discovery of similar words (synonyms and near-synonyms) from different kinds of sources: from large corpora of documents, from the Web, and from monolingual dictionaries. We present in detail three algorithms that extract similar words from a large corpus of documents and consider the specific case of the World Wide Web. We then describe a recent method of aut...

Publication date: 2004